Clustering Large Scale Data Set Based on Distributed Local Affinity Propagation on Spark
نویسندگان
چکیده
منابع مشابه
Clustering Large-Scale Data Based On Modified Affinity Propagation Algorithm
Traditional clustering algorithms are no longer suitable for use in data mining applications that make use of large-scale data. There have been many large-scale data clustering algorithms proposed in recent years, but most of them do not achieve clustering with high quality. Despite that Affinity Propagation (AP) is effective and accurate in normal data clustering, but it is not effective for l...
متن کاملDistributed and Incremental Clustering Based on Weighted Affinity Propagation
A new clustering algorithm Affinity Propagation (AP) is hindered by its quadratic complexity. The Weighted Affinity Propagation (WAP) proposed in this paper is used to eliminate this limitation, support two scalable algorithms. Distributed AP clustering handles large datasets by merging the exemplars learned from subsets. Incremental AP extends AP to online clustering of data streams. The paper...
متن کاملLocal and global approaches of affinity propagation clustering for large scale data
Recently a new clustering algorithm called ‘affinity propagation’ (AP) has been proposed, which efficiently clustered sparsely related data by passing messages between data points. However, we want to cluster large scale data where the similarities are not sparse in many cases. This paper presents two variants of AP for grouping large scale data with a dense similarity matrix. The local approac...
متن کاملEntropy-based Consensus for Distributed Data Clustering
The increasingly larger scale of available data and the more restrictive concerns on their privacy are some of the challenging aspects of data mining today. In this paper, Entropy-based Consensus on Cluster Centers (EC3) is introduced for clustering in distributed systems with a consideration for confidentiality of data; i.e. it is the negotiations among local cluster centers that are used in t...
متن کاملAn Efficient Data Replication Strategy in Large-Scale Data Grid Environments Based on Availability and Popularity
The data grid technology, which uses the scale of the Internet to solve storage limitation for the huge amount of data, has become one of the hot research topics. Recently, data replication strategies have been widely employed in distributed environment to copy frequently accessed data in suitable sites. The primary purposes are shortening distance of file transmission and achieving files from ...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
ژورنال
عنوان ژورنال: International Journal of Database Theory and Application
سال: 2016
ISSN: 2005-4270,2005-4270
DOI: 10.14257/ijdta.2016.9.10.20